    Driving Markov chain Monte Carlo with a dependent random stream

    Markov chain Monte Carlo is a widely used technique for generating a dependent sequence of samples from complex distributions. Conventionally, these methods require a source of independent random variates. Most implementations use pseudo-random numbers instead because generating true independent variates with a physical system is not straightforward. In this paper we show how to modify some commonly used Markov chains to use a dependent stream of random numbers in place of independent uniform variates. The resulting Markov chains have the correct invariant distribution without requiring detailed knowledge of the stream's dependencies or even its marginal distribution. As a side effect, sometimes far fewer random numbers are required to obtain accurate results. Comment: 16 pages, 4 figures
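
    For orientation, the conventional setup that the abstract contrasts against can be sketched as a random-walk Metropolis sampler that draws every random choice from an explicitly supplied stream of uniform variates. This is only the standard baseline, which assumes the stream is independent and uniform on [0, 1]; it is not the modified chain proposed in the paper. The function name `metropolis_from_stream` and its parameters are hypothetical.

```python
import math

def metropolis_from_stream(log_density, x0, uniforms, step=1.0):
    """Random-walk Metropolis driven by a supplied stream of uniforms.

    Baseline sketch only: it assumes `uniforms` yields independent
    Uniform[0, 1] values. It does NOT implement the paper's method
    for dependent streams.
    """
    stream = iter(uniforms)
    x = x0
    samples = []
    # Each step consumes two variates: one for the proposal, one for acceptance.
    for u_prop, u_acc in zip(stream, stream):
        # Symmetric uniform proposal on [x - step, x + step].
        proposal = x + step * (2.0 * u_prop - 1.0)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, p(proposal) / p(x)).
        if u_acc < math.exp(min(0.0, log_ratio)):
            x = proposal
        samples.append(x)
    return samples

def std_normal_log_density(x):
    # Unnormalised log density of a standard normal target.
    return -0.5 * x * x
```

    Passing the stream in as an argument, rather than calling a generator inside the sampler, makes it explicit which variates the chain consumes and in what order.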

    A nonparametric HMM for genetic imputation and coalescent inference

    Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than other, more general nonparametric models that have previously been applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project, and also on data simulated from a population bottleneck, we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.

    Random Tessellation Forests

    Space partitioning methods such as random forests and the Mondrian process are powerful machine learning methods for multi-dimensional and relational data, and are based on recursively cutting a domain. The flexibility of these methods is often limited by the requirement that the cuts be axis-aligned. The Ostomachion process and the self-consistent binary space partitioning-tree process were recently introduced as generalizations of the Mondrian process for space partitioning with non-axis-aligned cuts in the two-dimensional plane. Motivated by the need for a multi-dimensional partitioning tree with non-axis-aligned cuts, we propose the Random Tessellation Process (RTP), a framework that includes the Mondrian process and the binary space partitioning-tree process as special cases. We derive a sequential Monte Carlo algorithm for inference, and provide random forest methods. Our process is self-consistent and can relax axis-aligned constraints, allowing complex inter-dimensional dependence to be captured. We present a simulation study, and analyse gene expression data of brain tissue, showing improved accuracies over other methods. Comment: 11 pages, 4 figures

    Modeling Population Structure Under Hierarchical Dirichlet Processes

    We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model allows inferring and classifying individuals as unadmixed or admixed, inferring the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulated experiment and show that our algorithm outperforms parametric methods.

    Genome-Wide Association with Uncertainty in the Genetic Similarity Matrix

    Genome-wide association studies (GWASs) are often confounded by population stratification and structure. Linear mixed models (LMMs) are a powerful class of methods for uncovering genetic effects while controlling for such confounding. LMMs include random effects for a genetic similarity matrix, and they assume that a true genetic similarity matrix is known. However, uncertainty about the phylogenetic structure of a study population may degrade the quality of LMM results. This may happen in bacterial studies in which the number of samples or loci is small, or in studies with low-quality genotyping. In this study, we develop methods for linear mixed models in which the genetic similarity matrix is unknown and is derived from Markov chain Monte Carlo estimates of the phylogeny. We apply our model to a GWAS of multidrug resistance in tuberculosis, and illustrate our methods on simulated data.

    Path Selection for Quantum Repeater Networks

    Quantum networks will support long-distance quantum key distribution (QKD) and distributed quantum computation, and are an active area of both experimental and theoretical research. Here, we present an analysis of topologically complex networks of quantum repeaters composed of heterogeneous links. Quantum networks have fundamental behavioral differences from classical networks; the delicacy of quantum states makes a practical path selection algorithm imperative, but classical notions of resource utilization are not directly applicable, rendering known path selection mechanisms inadequate. To adapt Dijkstra's algorithm for quantum repeater networks that generate entangled Bell pairs, we quantify the key differences and define a link cost metric, seconds per Bell pair of a particular fidelity, where a single Bell pair is the resource consumed to perform one quantum teleportation. Simulations that include both the physical interactions and the extensive classical messaging confirm that Dijkstra's algorithm works well in a quantum context. Simulating about three hundred heterogeneous paths, comparing our path cost and the total work along the path gives a coefficient of determination of 0.88 or better. Comment: 12 pages, 8 figures
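
    As a rough illustration of the path selection idea, the sketch below runs a textbook Dijkstra search over a graph whose edge weights stand for a link cost in seconds per Bell pair. It assumes that a path's cost is the plain sum of its link costs, which the abstract does not state, and the node names and cost values are made up for the example; none of this is taken from the paper's simulator.

```python
import heapq

def cheapest_path(links, source, target):
    """Textbook Dijkstra over non-negative link costs.

    `links` maps a node to a list of (neighbor, cost) pairs, where cost
    is a hypothetical "seconds per Bell pair" figure for that link.
    Assumption: path cost is the sum of the link costs along the path.
    Returns (total_cost, path); path is empty if target is unreachable.
    """
    best = {source: 0.0}      # cheapest known cost to each node
    previous = {}             # back-pointers for path reconstruction
    finished = set()
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node in finished:
            continue          # stale heap entry
        finished.add(node)
        if node == target:
            break
        for neighbor, link_cost in links.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    if target not in best:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return best[target], path

# Made-up four-node network with heterogeneous links.
example_links = {
    "A": [("B", 0.5), ("C", 0.25)],
    "B": [("D", 0.5)],
    "C": [("D", 1.5)],
    "D": [],
}
```

    With these made-up costs, the search prefers A, B, D (total 1.0) over A, C, D (total 1.75), even though the first hop to C is cheaper.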

    A saposin deficiency model in Drosophila: Lysosomal storage, progressive neurodegeneration and sensory physiological decline

    Saposin deficiency is a childhood neurodegenerative lysosomal storage disorder (LSD) that can cause premature death within three months of life. Saposins are activator proteins that promote the function of lysosomal hydrolases that mediate the degradation of sphingolipids. There are four saposin proteins in humans, which are encoded by the prosaposin gene. Mutations causing an absence or impaired function of individual saposins or the whole prosaposin gene lead to distinct LSDs due to the storage of different classes of sphingolipids. The pathological events leading to neuronal dysfunction induced by lysosomal storage of sphingolipids are as yet poorly defined. We have generated and characterised a Drosophila model of saposin deficiency that shows striking similarities to the human diseases. Drosophila saposin-related (dSap-r) mutants show reduced longevity, progressive neurodegeneration, lysosomal storage, dramatic swelling of neuronal soma, perturbations in sphingolipid catabolism, and sensory physiological deterioration. Our data suggest a genetic interaction with a calcium exchanger (Calx), pointing to a possible calcium homeostasis deficit in dSap-r mutants. Together these findings support the use of dSap-r mutants in advancing our understanding of the cellular pathology implicated in saposin deficiency and related LSDs.

    Geographically touring the eastern bloc: British geography, travel cultures and the Cold War

    This paper considers the role of travel in the generation of geographical knowledge of the eastern bloc by British geographers. Based on oral history and surveys of published work, the paper examines the roles of three kinds of travel experience: individual private travels, tours via state tourist agencies, and tours by academic delegations. Examples are drawn from across the eastern bloc, including the USSR, Poland, Romania, East Germany and Albania. The relationship between travel and publication is addressed, notably within textbooks and in the Geographical Magazine. The study argues for the extension of accounts of cultures of geographical travel, and seeks to supplement the existing historiography of Cold War geography.

    Leaf litter decomposition -- Estimates of global variability based on Yasso07 model

    Litter decomposition is an important process in the global carbon cycle. It accounts for most of the heterotrophic soil respiration and results in the formation of more stable soil organic carbon (SOC), which is the largest terrestrial carbon stock. Litter decomposition may induce remarkable feedbacks to climate change because it is a climate-dependent process. To investigate the global patterns of litter decomposition, we developed a description of this process and tested the validity of this description using a large set of foliar litter mass loss measurements (nearly 10 000 data points derived from approximately 70 000 litter bags). We applied the Markov chain Monte Carlo method to estimate uncertainty in the parameter values and results of our model, called Yasso07. The model appeared globally applicable. It estimated the effects of litter type (plant species) and climate on mass loss with little systematic error over the first 10 decomposition years, using only initial litter chemistry, air temperature and precipitation as input variables. Illustrative of the global variability in litter mass loss rates, our example calculations showed that a typical conifer litter had 68% of its initial mass still remaining after two decomposition years in tundra, while a deciduous litter had only 15% remaining in the tropics. Uncertainty in these estimates, a direct result of the uncertainty of the parameter values of the model, varied according to the distribution of the litter bag data among climate conditions and ranged from 2% in tundra to 4% in the tropics. This reliability was adequate to use the model and distinguish the effects of even small differences in litter quality or climate conditions on litter decomposition as statistically significant. Comment: 19 pages, to appear in Ecological Modelling